Smart Paper Notes

A Lightweight Reconstruction Network for Surface Defect Inspection

Chao Hu , Jian Yao , Weijie Wu , Weibin Qiu , Liqiang Zhu

Categories: Computer Vision | Machine Learning

2022-12-25

Most current deep learning methods cannot cope with the scarcity of defect samples for industrial products or with the large variation in defect characteristics. This paper proposes an unsupervised defect detection algorithm based on a reconstruction network, trained only on easily obtained defect-free samples. The network has two parts: image reconstruction and surface defect area detection. The reconstruction network is a fully convolutional autoencoder with a lightweight structure. Only a small number of normal samples are used for training, so that the reconstruction network learns to generate a defect-free reconstructed image. A loss function combining a structural loss and an $L_1$ loss is proposed for the reconstruction network to address poor detection of defects on irregularly textured surfaces. The residual between the reconstructed image and the image under test is then taken as the candidate defect region, and conventional image operations locate the defect. The proposed algorithm is evaluated on multiple defect image sample sets. Compared with other similar algorithms, the results show that it is robust and accurate.
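The residual step described in the abstract above can be illustrated with a minimal sketch. It assumes grayscale images scaled to [0, 1] and uses a fixed threshold plus a bounding box as the "conventional image operations"; the function name `locate_defect` and the threshold value are hypothetical choices for illustration, not the authors' pipeline.

```python
import numpy as np

def locate_defect(test_img, recon_img, threshold=0.2):
    """Return (mask, bbox) from the absolute residual of two grayscale images.

    bbox is (row_min, col_min, row_max, col_max), or None if nothing exceeds
    the threshold. The threshold is an assumed value, not one from the paper.
    """
    residual = np.abs(test_img.astype(np.float64) - recon_img.astype(np.float64))
    mask = residual > threshold
    if not mask.any():
        return mask, None
    rows = np.flatnonzero(mask.any(axis=1))  # rows containing residual pixels
    cols = np.flatnonzero(mask.any(axis=0))  # columns containing residual pixels
    bbox = (int(rows[0]), int(cols[0]), int(rows[-1]), int(cols[-1]))
    return mask, bbox

# Toy data: a flat "defect-free" reconstruction and a test image with a bright patch.
recon = np.full((8, 8), 0.5)
test = recon.copy()
test[2:4, 5:7] = 1.0
mask, bbox = locate_defect(test, recon)
print(bbox)
```

In practice the mask would usually be cleaned up (for example with morphological filtering) before extracting regions; that step is omitted here.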

translated by Google Translate

Learning Neural Volumetric Field for Point Cloud Geometry Compression

Yueyu Hu , Yao Wang

Categories: Computer Vision

2022-12-11

Due to the diverse sparsity, high dimensionality, and large temporal variation of dynamic point clouds, it remains a challenge to design an efficient point cloud compression method. We propose to code the geometry of a given point cloud by learning a neural volumetric field. Instead of representing the entire point cloud using a single overfit network, we divide the entire space into small cubes and represent each non-empty cube by a neural network and an input latent code. The network is shared among all the cubes in a single frame or multiple frames, to exploit the spatial and temporal redundancy. The neural field representation of the point cloud includes the network parameters and all the latent codes, which are generated by using back-propagation over the network parameters and its input. By considering the entropy of the network parameters and the latent codes as well as the distortion between the original and reconstructed cubes in the loss function, we derive a rate-distortion (R-D) optimal representation. Experimental results show that the proposed coding scheme achieves superior R-D performances compared to the octree-based G-PCC, especially when applied to multiple frames of a point cloud video. The code is available at https://github.com/huzi96/NVFPCC/.
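The space-partitioning idea in the abstract above (split the volume into small cubes and keep only the non-empty ones) can be sketched as follows. This is only an illustration of the partitioning step under assumed conventions: `partition_into_cubes` and `cube_size` are hypothetical names, and the shared network, latent codes, and rate-distortion training are not shown.

```python
from collections import defaultdict
import numpy as np

def partition_into_cubes(points, cube_size):
    """Group 3D points by the cube they fall in.

    Returns a dict mapping an integer cube index (i, j, k) to an array of the
    points inside that cube, expressed in cube-local coordinates. Only
    non-empty cubes appear as keys.
    """
    indices = np.floor(points / cube_size).astype(int)
    cubes = defaultdict(list)
    for idx, point in zip(indices, points):
        cubes[tuple(int(i) for i in idx)].append(point - idx * cube_size)
    return {key: np.stack(local) for key, local in cubes.items()}

points = np.array([
    [0.1, 0.2, 0.3],
    [0.9, 0.8, 0.7],
    [1.5, 0.1, 0.1],
    [3.2, 3.9, 3.1],
])
cubes = partition_into_cubes(points, cube_size=1.0)
for key in sorted(cubes):
    print(key, len(cubes[key]))
```

Each non-empty cube would then be paired with its own latent code, while the decoder network is shared across cubes, as the abstract describes.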


Artificial Intelligence Security Competition (AISC)

Yinpeng Dong , Peng Chen , Senyou Deng , Lianji L , Yi Sun , Hanyu Zhao , Jiaxing Li , Yunteng Tan , Xinyu Liu , Yangyi Dong

Categories: Artificial Intelligence | Computer Vision | Machine Learning

2022-12-07

The security of artificial intelligence (AI) is an important research area towards safe, reliable, and trustworthy AI systems. To accelerate the research on AI security, the Artificial Intelligence Security Competition (AISC) was organized by the Zhongguancun Laboratory, China Industrial Control Systems Cyber Emergency Response Team, Institute for Artificial Intelligence, Tsinghua University, and RealAI as part of the Zhongguancun International Frontier Technology Innovation Competition (https://www.zgc-aisc.com/en). The competition consists of three tracks, including Deepfake Security Competition, Autonomous Driving Security Competition, and Face Recognition Security Competition. This report will introduce the competition rules of these three tracks and the solutions of top-ranking teams in each track.


Attention Spiking Neural Networks

Man Yao , Guangshe Zhao , Hengyu Zhang , Yifan Hu , Lei Deng , Yonghong Tian , Bo Xu , Guoqi Li

Categories: Computer Vision

2022-09-28

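Benefiting from the event-driven and sparse spiking characteristics of the brain, spiking neural networks (SNNs) are becoming an energy-efficient alternative to artificial neural networks (ANNs). However, the performance gap between SNNs and ANNs has long hindered the deployment of SNNs. To exploit the full potential of SNNs, we study the effect of attention mechanisms in SNNs. We first present our idea of attention as a plug-and-play kit, termed Multi-dimensional Attention (MA). We then propose a new attention SNN architecture with end-to-end training, called "MA-SNN", which infers attention weights along the temporal, channel, and spatial dimensions, separately or simultaneously. Based on existing neuroscience theories, we use the attention weights to optimize membrane potentials, which in turn regulate the spiking response in a data-dependent way. At the cost of negligible additional parameters, MA enables vanilla SNNs to achieve sparser spiking activity, better performance, and better energy efficiency. Experiments are conducted on event-based DVS128 Gesture/Gait action recognition and ImageNet-1K image classification. On Gesture/Gait, the spike counts are reduced by 84.9%/81.6%, and the task accuracy and energy efficiency are improved by 5.9%/4.7% and 3.4$\times$/3.2$\times$. On ImageNet-1K, we achieve top-1 accuracy of 75.92% and 77.08% on single/4-step Res-SNN-104, which are state-of-the-art results for SNNs. To the best of our knowledge, this is the first time the SNN community has achieved comparable or even better performance than its ANN counterpart on a large-scale dataset. Our work highlights the potential of SNNs as a general backbone for various SNN applications, with a good balance between effectiveness and efficiency.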


Scale Attention for Learning Deep Face Representation: A Study Against Visual Scale Variation

Hailin Shi , Hang Du , Yibo Hu , Jun Wang , Dan Zeng , Ting Yao

Categories: Computer Vision

2022-09-19

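Human face images usually appear at a wide range of visual scales. Existing face representations handle scale variation through a multi-scale scheme that assembles a finite series of predefined scales. Such a multi-shot scheme adds inference burden, and the predefined scales inevitably differ from those in real data. Learning the scale parameters from data and using them for one-shot feature inference is a better solution. To this end, we reform the conv layer by resorting to scale-space theory and obtain two benefits: 1) the conv layer learns a set of scales from the real data distribution, each of which is realized by a conv kernel; 2) the layer automatically highlights the feature at the proper channel and location corresponding to the scale of the input pattern and its presence. We then achieve hierarchical scale attention by stacking the reformed layers, building a novel architecture named SCale AttentioN Conv Neural Network (SCAN-CNN). We apply SCAN-CNN to the face recognition task and push the frontier of SOTA performance. The accuracy gain is more evident when the face images are blurry. Meanwhile, as a single-shot scheme, inference is more efficient than multi-shot fusion. A set of tools is provided to ensure fast training of SCAN-CNN and zero increase in inference cost compared with a plain CNN.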


Alexa, Let's Work Together: Introducing the First Alexa Prize TaskBot Challenge on Conversational Task Assistance

Anna Gottardi , Osman Ipek , Giuseppe Castellucci , Shui Hu , Lavina Vaz , Yao Lu , Anju Khatri , Anjali Chadha , Desheng Zhang , Sattvik Sahai

Categories: Natural Language Processing | Artificial Intelligence

2022-09-13

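Since its inception in 2016, the Alexa Prize program has enabled hundreds of university students to explore and compete to develop conversational agents through the SocialBot Grand Challenge. The goal of the challenge is to build agents capable of conversing coherently and engagingly with humans on popular topics for 20 minutes while achieving an average rating of at least 4.0/5.0. However, as conversational agents attempt to help users complete increasingly complex tasks, new conversational AI techniques and evaluation platforms are needed. The Alexa Prize TaskBot Challenge, established in 2021, builds on the success of the SocialBot Challenge by introducing the requirement of interactively assisting humans with real-world cooking and do-it-yourself tasks, while making use of both voice and visual modalities. This challenge requires TaskBots to identify and understand the user's needs, identify and integrate task and domain knowledge, and develop new ways of engaging the user without distracting them from the task at hand, among other challenges. This paper gives an overview of the TaskBot Challenge, describes the infrastructure support provided to the teams with the CoBot Toolkit, and summarizes the approaches the participating teams took to overcome the research challenges. Finally, it analyzes the performance of the competing TaskBots during the first year of the competition.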


Latent Heterogeneous Graph Network for Incomplete Multi-View Learning

Pengfei Zhu , Xinjie Yao , Yu Wang , Meng Cao , Binyuan Hui , Shuai Zhao , Qinghua Hu

Categories: Machine Learning | Computer Vision

2022-08-29

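Multi-view learning has progressed rapidly in recent years. Although many previous studies assume that each instance appears in all views, in real-world applications it is common for instances to be missing from some views, which results in incomplete multi-view data. To tackle this problem, we propose a novel Latent Heterogeneous Graph Network (LHGN) for incomplete multi-view learning, which aims to use multiple incomplete views as fully as possible in a flexible manner. By learning a unified latent representation, a trade-off between consistency and complementarity among different views is implicitly realized. To explore the complex relationship between samples and latent representations, a neighborhood constraint and a view constraint are proposed, for the first time, to construct a heterogeneous graph. Finally, to avoid any inconsistency between the training and test phases, a transductive learning technique based on graph learning is applied to classification tasks. Extensive experimental results on real-world datasets demonstrate the effectiveness of our model over existing state-of-the-art approaches.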


Vector Quantized Diffusion Model with CodeUnet for Text-to-Sign Pose Sequences Generation

Pan Xie , Qipeng Zhang , Zexian Li , Hao Tang , Yao Du , Xiaohui Hu

Categories: Computer Vision

2022-08-19

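Sign Language Production (SLP) aims to translate spoken languages into sign sequences automatically. The core process of SLP is to transform sign gloss sequences into their corresponding sign pose sequences (G2P). Most existing G2P models perform this conditional long-range generation in an autoregressive manner, which inevitably leads to an accumulation of errors. To address this issue, we propose a vector quantized diffusion method for conditional pose sequence generation, called PoseVQ-Diffusion, which is an iterative non-autoregressive method. Specifically, we first introduce a vector quantized variational autoencoder (Pose-VQVAE) model to represent a pose sequence as a sequence of latent codes. We then model the latent discrete space with an extension of the recently developed diffusion architecture. To better exploit spatio-temporal information, we introduce a novel architecture, CodeUnet, to generate higher-quality pose sequences in the discrete space. Moreover, taking advantage of the learned codes, we develop a novel sequential k-nearest-neighbours method to predict the variable lengths of pose sequences for the corresponding gloss sequences. Consequently, compared with autoregressive G2P models, our model has a faster sampling speed and produces significantly better results. Compared with previous non-autoregressive G2P methods, PoseVQ-Diffusion improves the predicted results through iterative refinement, achieving state-of-the-art results on the SLP evaluation benchmark.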
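The vector quantization step that the abstract above relies on (representing a pose sequence as a sequence of discrete latent codes) can be sketched generically as a nearest-codebook lookup. This is a textbook VQ assignment under assumed shapes, not the Pose-VQVAE implementation; the `quantize` helper and the toy codebook are hypothetical.

```python
import numpy as np

def quantize(features, codebook):
    """Assign each feature vector to its nearest codebook entry.

    features: array of shape (T, D), one vector per time step.
    codebook: array of shape (K, D).
    Returns (codes, quantized): integer indices of shape (T,) and the
    corresponding codebook vectors of shape (T, D).
    """
    # Squared Euclidean distance between every feature and every code: (T, K).
    distances = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    codes = distances.argmin(axis=1)
    return codes, codebook[codes]

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]])
features = np.array([[0.1, -0.1], [0.9, 1.2], [-0.8, 0.7]])
codes, quantized = quantize(features, codebook)
print(codes)
```

A generative model over the discrete space, such as the diffusion model the abstract mentions, would then operate on `codes` rather than on raw pose coordinates.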


Pandemic Control, Game Theory and Machine Learning

Yao Xuan , Robert Balkin , Jiequn Han , Ruimeng Hu , Hector D. Ceniceros

Categories: Machine Learning

2022-08-18

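Game theory has been an effective tool for controlling the spread of disease and for suggesting optimal policies at both the individual and regional levels. In this AMS Notices article, we focus on decision-making for COVID-19 interventions, aiming to provide mathematical models and efficient machine learning methods, to justify related policies that have been implemented in the past, and to explain from a game-theoretic viewpoint how the authorities' decisions affect their neighboring regions.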


Bridging the Gap Between Training and Inference of Bayesian Controllable Language Models

Han Liu , Bingning Wang , Ting Yao , Haijin Liang , Jianjin Xu , Xiaolin Hu

Categories: Natural Language Processing | Artificial Intelligence

2022-06-11

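Large-scale pre-trained language models have achieved great success on natural language generation tasks. However, it is difficult to control pre-trained language models to generate sentences with desired attributes such as topic and sentiment. Recently, Bayesian Controllable Language Models (BCLMs) have been shown to be effective for controllable language generation. Rather than fine-tuning the parameters of the pre-trained language model, BCLMs use external discriminators to guide its generation. However, the mismatch between training and inference of BCLMs limits the performance of the models. To address this problem, we propose a "Gemini Discriminator" for controllable language generation, which alleviates the mismatch problem at a small computational cost. We test our method on two controllable language generation tasks: sentiment control and topic control. On both tasks, our method achieves new state-of-the-art results in automatic evaluation.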
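The discriminator-guided decoding mentioned in the abstract above is commonly written as a Bayes-rule reweighting of the next-token distribution, p(token | attribute) ∝ p(token) · p(attribute | token). The sketch below illustrates that generic form only, under the assumption that the discriminator returns a per-token attribute probability; it is not the Gemini Discriminator itself, and `guided_next_token_probs` and `weight` are hypothetical names.

```python
import numpy as np

def guided_next_token_probs(lm_log_probs, disc_log_probs, weight=1.0):
    """Combine language-model and discriminator scores for each candidate token.

    lm_log_probs:   log p(token | context) from the frozen language model.
    disc_log_probs: log p(attribute | context, token) from an external discriminator.
    weight:         assumed knob controlling how strongly the discriminator steers.
    """
    scores = lm_log_probs + weight * disc_log_probs
    scores = scores - scores.max()  # subtract the max for numerical stability
    probs = np.exp(scores)
    return probs / probs.sum()

# Toy vocabulary of three tokens.
lm_probs = np.array([0.5, 0.3, 0.2])
attr_probs = np.array([0.1, 0.8, 0.1])
guided = guided_next_token_probs(np.log(lm_probs), np.log(attr_probs))
print(guided.argmax())
```

Note that the language model's parameters are never updated here, which matches the abstract's point that BCLMs steer generation without fine-tuning.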
